Graph Mining under Linguistic Constraints for Exploring Large Texts
Abstract
In this paper, we propose an approach to exploring large texts by highlighting coherent sub-parts. The exploration method relies on a graph representation of the text built according to Hoey's linguistic model, which allows both adjacent and non-adjacent sentences to be selected and bound together. Our main contribution is a method that combines Hoey's linguistic model with a specialized graph mining technique, called CoHoP mining, to extract coherent sub-parts from the graph representation of the text. Experiments on several English texts illustrate the value of the proposed approach.
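The graph construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a simplified reading of Hoey's model in which two sentences are linked whenever they share at least `min_links` repeated words. The threshold of 3, the small stop-word list, and the function names `content_words` and `build_sentence_graph` are all assumptions made for the example. The CoHoP mining step is not shown.

```python
import re
from itertools import combinations

# Hypothetical stop-word list (an assumption for this sketch); a real system
# would likely use lemmatization and a much fuller list.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "it", "by"}


def content_words(sentence):
    """Return the set of lower-cased words in a sentence, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}


def build_sentence_graph(sentences, min_links=3):
    """Link every pair of sentences sharing at least `min_links` words.

    Returns an adjacency mapping {sentence index: set of linked indices}.
    Pairs are compared regardless of distance, so non-adjacent sentences
    can be linked as well as adjacent ones.
    """
    words = [content_words(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= min_links:
            graph[i].add(j)
            graph[j].add(i)
    return graph


sentences = [
    "Graph mining extracts coherent patterns from large graphs.",
    "The weather was pleasant yesterday.",
    "Coherent patterns in large graphs reveal text structure.",
]
graph = build_sentence_graph(sentences)
print(graph)
```

In this toy input, the first and third sentences share four words and are linked even though they are not adjacent, while the second sentence stays isolated. A mining step would then search the resulting graph for densely connected groups of sentences.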
Similar resources
Fouille de graphes sous contraintes linguistiques pour l'exploration de grands textes (Graph Mining Under Linguistic Constraints to Explore Large Texts) [in French]
In this paper, we propose an approach to explore large texts by highlighting coherent sub-parts. The exploration method relies on a graph representation of the text according to the Hoey linguistic model which allows the selection and the binding of sentences in the graph. Our contribution relates to using graph mining techniques ...
Transliteration Mining Using Large Training and Test Sets
Much previous work on Transliteration Mining (TM) was conducted on short parallel snippets using limited training data, and successful methods tended to favor recall. For such methods, increasing training data may impact precision and application on large comparable texts may impact precision and recall. We adapt a state-of-the-art TM technique with the best reported scores on the ACL 2010 NEWS...
Learning from Heterogeneous Genomic Data
Mining patterns under many kinds of constraints is a key point to successfully get new knowledge. In this paper, we propose an efficient new algorithm Music-dfs which soundly and completely mines patterns with various constraints from large data and takes into account external data represented by several heterogeneous datasets. Constraints are freely built of a large set of primitives and enabl...
Shallow vs. Deep Techniques for Handling Linguistic Constraints and Optimisations
An important aspect of many NLG systems is ensuring that all generated texts obey linguistic constraints and are (near-)optimal under linguistic quality measures. Where they are possible, deep techniques can automate the enforcement of linguistic constraints and optimisations. In contrast, shallow techniques require developers to explicitly enforce constraints and optimisations. Deep techniques...
Representation of texts as complex networks: a mesoscopic approach
Texts are complex structures emerging from an intricate system consisting of syntactical constraints and semantical relationships. While the complete modeling of such structures is impractical owing to the high level of complexity inherent to linguistic constructions, under a limited domain, certain tasks can still be performed. Recently, statistical techniques aiming at analysis of texts, refe...
Publication date: 2013